Fix ArrowNotImplementedError in SalesScrutinyStudy with pyarrow >= 22 #312
Open
drussellmrichie wants to merge 1 commit into larsiusprime:master from
When a DataFrame column has Arrow null dtype (all-null values with no known type), calling .astype(str) does not convert it to a string dtype with newer versions of pyarrow (observed with pyarrow 22 + pandas 2.3). Subsequent string concatenation with a large_string-typed column raises:
ArrowNotImplementedError: Function 'binary_join_element_wise' has no kernel matching input types (null, large_string, large_string)
Fix: go through Python object dtype first (.astype(object)) before .astype(str), and apply the same to model_group so that both sides of the concatenation are plain Python object strings.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Thank you for your contribution. I affirm that this contributor has signed the CLA. Russell Richie seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Owner
I affirm that this contributor has signed the CLA
Summary
When a DataFrame column has Arrow null dtype (all-null values with no inferred type), calling .astype(str) does not reliably convert it to a string dtype in newer pyarrow/pandas combinations (observed with pyarrow 22 + pandas 2.3). Subsequent string concatenation with a large_string-typed column then raises:
ArrowNotImplementedError: Function 'binary_join_element_wise' has no kernel matching input types (null, large_string, large_string)
This crashes SalesScrutinyStudy.__init__ for any model group.
Fix
Go through Python object dtype first (.astype(object)) before .astype(str) on both ss_id and model_group, bypassing the Arrow backend for this string concatenation.
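A minimal sketch of the object-first conversion described above. The actual ss_id construction in SalesScrutinyStudy.__init__ is not shown in this PR, so the helper names (to_plain_str, build_ss_id), the column names (key, model_group) and the "<model group>_<key>" layout are all hypothetical. In pandas 2.x the result is object dtype; other pandas versions may produce a different string dtype.

```python
import pandas as pd


def to_plain_str(s: pd.Series) -> pd.Series:
    """Stringify a Series by way of Python objects.

    .astype(object) materializes the values as a NumPy object array, so the
    following .astype(str) starts from Python objects rather than from an
    Arrow-backed array (such as an all-null column with Arrow null type).
    """
    return s.astype(object).astype(str)


def build_ss_id(df: pd.DataFrame, key_col: str, group_col: str) -> pd.Series:
    # Hypothetical ss_id layout: "<model group>_<key>". Both column operands
    # go through to_plain_str, so neither is left with Arrow null type.
    return to_plain_str(df[group_col]) + "_" + to_plain_str(df[key_col])


df = pd.DataFrame({"key": [101, 102], "model_group": ["residential", "commercial"]})
print(build_ss_id(df, "key", "model_group").tolist())
# → ['residential_101', 'commercial_102']
```

Normalizing both operands, rather than only the column that happens to be null-typed, means the concatenation no longer depends on which side pandas inferred as Arrow-backed for a given locality.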
Reproduction
Install openavmkit with pyarrow >= 22 and pandas >= 2.3, then run
the sales scrutiny step on any locality. The crash occurs in
SalesScrutinyStudy.__init__ at the ss_id construction lines.
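Before reproducing, a quick check of the environment and the input frame can show whether the conditions above apply. This is a hypothetical sketch, not part of openavmkit: installed_version reads package versions via importlib.metadata, and arrow_null_columns lists columns backed by the Arrow null type. It assumes pandas' ArrowDtype naming convention, under which such a column reports its dtype as "null[pyarrow]".

```python
from __future__ import annotations

import importlib.metadata

import pandas as pd


def installed_version(package: str) -> str | None:
    # Installed distribution version, or None if the package is absent.
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def arrow_null_columns(df: pd.DataFrame) -> list[str]:
    # pd.ArrowDtype names follow "<pyarrow type>[pyarrow]", so a column
    # backed by the Arrow null type reports its dtype as "null[pyarrow]".
    return [str(col) for col, dtype in df.dtypes.items() if str(dtype) == "null[pyarrow]"]


print("pandas:", installed_version("pandas"), "pyarrow:", installed_version("pyarrow"))

# A NumPy-backed all-None column is object dtype, not Arrow null type.
sample = pd.DataFrame({"price": [250_000, 310_000], "notes": [None, None]})
print(arrow_null_columns(sample))
# → []
```

On a frame loaded with Arrow-backed dtypes, any column this returns is a likely candidate for the failure described in the Summary.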